Is this NE tagger getting old?

نویسندگان

  • Cristina Mota
  • Ralph Grishman
چکیده

This paper focuses on the influence of changing the text time frame on the performance of a named entity tagger. We followed a twofold approach to investigate this subject: on the one hand, we analyzed a corpus that spans 8 years, and, on the other hand, we assessed the performance of a name tagger trained and tested on that corpus. We created 8 samples from the corpus, each drawn from the articles for a particular year. In terms of corpus analysis, we calculated the corpus similarity and names shared between samples. To see the effect on tagger performance, we implemented a semi-supervised name tagger based on co-training; then, we trained and tested our tagger on those samples. We observed that corpus similarity, names shared between samples, and tagger performance all decay as the time gap between the samples increases. Furthermore, we observed that the corpus similarity and names shared correlate with the tagger F-measure. These results show that named entity recognition systems may become obsolete in a short period of time.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

1998 Hub-4 Information Extraction Evaluation

This paper documents the Information Extraction Named-Entity Evaluation (IE-NE), one of the new spokes added to the DARPA-sponsored 1998 Hub-4 Broadcast News Evaluation. This paper discusses the information extraction task as posed for the 1998 Broadcast News Evaluation. This paper reviews the evaluation metrics, the scoring process, and the test corpus that was used for the evaluation. Finally...

متن کامل

A Case Restoration Approach to Named Entity Tagging in Degraded Documents

This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text, corresponding to names of people, places, organizations, times/dates, etc. Although NE tagging is typically part of a larger information extraction process, it has other applications, such as improving search in an informat...

متن کامل

NYU/CRL QA System for QAC-2

The input of this component is the question sentence and the output is the list of keywords, NE types of the answer, and several kinds of minor information. The sentence is first analyzed by morphological analyzer, JUMAN [JUMAN homepage], and our NE tagger. Words are concatenated if it is a sequence of noun-prefix, nouns and noun-suffix or a NE expression, while the individual words remains (wh...

متن کامل

Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy

The tagging of Named Entities, the names of particular things or classes, is regarded as an important component technology for many NLP applications. The first Named Entity set had 7 types, organization, location, person, date, time, money and percent expressions. Later, in the IREX project artifact was added and ACE added two, GPE and facility, to pursue the generalization of the technology. H...

متن کامل

Updating a Name Tagger Using Contemporary Unlabeled Data

For many NLP tasks, including named entity tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of analyzing new data given a semi-supervised NE tagger trained on data from an earlier time period. We will show that updating the unlabeled data is sufficient to maint...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008